As part of the UK pandemic response, we develop methods to identify and monitor emerging variants of interest. Your job today is to look at clusters of interest, which may have sequences with specific lineages or lineage and mutation combinations, and report on growth rates. We will look at outputs from the scanning tool designed for United Kingdom Health Security Agency (UKHSA), formerly known as Public Health England (PHE). These images and data come from work conducted as part of the SARS-CoV-2 pandemic response in the UK to monitor emerging variants of interest or concern in real time.
Imagine the date is the 28th of May, 2021. PHE has requested a situation report on the top growth clusters of interest, and more specifically on the top growth rates of B.1.617.2 (Delta) in the country as this is a relatively newly emerged VOC.
Use the practical period to review the figures and discuss with your classmates discussion prompts as they come up. This report should be about three pages in length, and include a few figures or tables from the scanning tool as part of the report.
Factors to consider when discussing with your classmates:
Additionally, remember that we are only including P2 (lighthouse or community) samples in our scanning tool and outputs. This is always good to mention. Why? Because different sampling methods can lead to sample bias in our results.
Discussion prompt
N.B You can choose to develop a situation report based on your discussion today and include this report as part of the SuS Revolutions in Biomedicine portfolio. You are not expected to complete this report in the practical period, so please take the time to discuss outputs with your classmates as you work through the practical together.
Table 1. provides a reference of lineages of interest or concern that you should consider looking at in the scanner outputs. Please note, all scanner outputs utlise the pangolin lineage nomenclature, but feel free to call these be either WHO, pangolin lineage nomenclature or VOC/VUI designation as you discuss.
We first look at growth rates for all clusters in the most recent scanner tool run, where each point represents a different cluster in the tree. This scanner tool run includes sequences between Feb 15th and May 17th. The x axis shows the date of last sample from each cluster on the plot. The y axis represents logistic growth rate.
Although there are quite a few clusters on the image, notice that we can toggle clusters with specific lineages on and off the figure by clicking on a lineage in the legend. Double click allows you to isolate only that specific lineage. Although clusters can (and generally do) contain more than one lineage, we identify which lineage is most prominent in a cluster and label a cluster as “lineage +”. Note in the legend, colour of a point represents dominant lineage in a cluster and size of a point represents cluster size.
Notice the graph has lots of functional features, including zooming in and selecting specific regions of the plot. For example, use box select to look at only clusters with a most recent sample date between May 01 and May 17. You can download the plot as a png as well for further observation or use (for example if you decide to write a situation report beyond the practical), including or excluding lineages as you please before downloading.
Discussion prompt
Which clusters show the highest growth rates? What lineages do they include?
How recent are the sample dates for the top 10 highest growth clusters? Do we see clusters with higher growth rates but less recent samples? More specifically, what do we see happening with B.1.1.7 vs. B.1.617.2?
How large (number of sequences) are the clusters for B.1.617.2, B.1.617.2+ and B.1.1.7? How recent are the sample most recent sample date for the largest B.1.1.7 clusters? Where do we see the largest B.1.617.2 clusters? What do these findings suggest to you as an epidemiologist?
What non-pharmaceutical interventions (NPIs) are in place in the UK (remember the date is May 28, 2021), and how might this impact what is considered a high growth rate?
Next, we look at a selection of complete scanner runs (every scanner tool run between April 01 - May 17, 2021) and a selection of lineages of interest or concern over time, to see how growth rate estimates for these lineages have changed over time. It is important to consider how growth rates change over time, as it is possible early growth rate estimates for a newly introduced lineage could be biased due multiple introductions or importations into the UK population from the international reservoir.
Discussion prompt
How would we expect importations and/or multiple introductions to bias our growth rate estimates?
Consider when B.1.617.2 is thought to have been first introduced into the UK population (Early April). Do you think this could be biasing our current growth estimates from our most recent run (figure above)?
The figure below shows B.1.1.7, B.1.351, B.1.525, B.1.617.2 and C.36 scanner growth rate estimates from multiple scanner runs (40 different runs). Every time point on the y axis represents a scanner run, where the date is the most recent possible sample date within that scanner run.
The points at each time point represent every cluster of the selected lineages within that specific scanner run. The plot shows a total of 40 scanner runs, with only clusters with growth rate > 0 included in the figure. For each point, colour represents the primary lineage within the cluster and size represents cluster size.
Discussion prompt
What does this output suggests in terms of growth for different VOC/VUI shown over time? Remember to again consider things such as cluster size, most recent sample date, potential time of importations or introduction in the population, and non-pharmaceutical interventions that might have been in place at different times in the outbreak period shown.
What do the earlier smaller high growth B.1.617.2 clusters suggest?
As a reference point, during peak of the second wave (~ Jan 2021) when the UK was in lockdown, a relative growth rate per generation of 0.2 or greater was considerd to be a high growth rate. Often the top 5 growth clusters would land somewhere between 0.2 and 0.3. Referring back to the previous two figures, consider where the highest growth clusters are now in terms of growth rate for comparison. What does this suggest about the outbreak overall? Why is there such a difference in terms of “high” growth rates between lockdown and not lockdown periods?
If time allows, continue to the next section.
Returning to the outputs from our most recent scanner run, we can further look at growth over time and space, using geographically matched growth rate estimates (using a generalised additive model, GAM, combined with a Gaussian markov random field model). Don’t worry too much about the statistics, the main focus here is to identify where growth is occurring.
In this case, we use a region level designated in the UK as lower tier local authority (LTLA). By using the previously described predictive model to estimate spatial growth across smaller regions, we can help inform NHS test and trace (the UK’s testing and contact tracing service) where to increase testing initiatives, as well as help UKHSA identify regions of concern for transmission. Color represents relative growth per generation.
We look at the top 3 growth clusters for B.1.1.7 and B.1.617.2, respectively, excluding samples where the most recent sample date is earlier than 01/05/2021. Table 2 shows a recap of these clusters, including cluster size, most recent sample date, least recent sample date, and logistic growth rate from scanner output.
The figures below show estimated lineage frequency between LTLAs and over epidemiological weeks for each of the 6 clusters in Table 2, where epi week 1 is the first week of the year. Estimates are based on a single cluster of interest. Color represents lineage frequency within LTLAs. Note scale and epi week number vary by cluster.
Discussion prompt
Why would certain areas in the population have higher transmission than others? Which epidemiological factors do we need to consider when thinking about variation in transmission between regions?
Where are clusters of interest primarily occuring? Do specific lineages appear to be more localised? Do specific lineages appear to be very spread out? What are the conotations of a fast growing cluster appearing in multiple locations across the UK?
If you combined the clusters from each lineage respectively onto one figure, how wide spread would these B.1.1.7 vs. B.1.617.2 clusters appear by epi week 18 or 19?
Do these frequency maps give any suggestion as to where B.1.617.2 was likely to have been introduced at origin into the UK?
How could we use this information as policy makers and epidemiologists to make decisions about non pharmaceutical interventtion and other policy implementation?
When writing your situation report, consider which information you found most useful in assessing the present scanner outputs. Include a few figures or tables which clearly support your results from the scanning tool, in terms of clusters with high growth rates, and which clusters you think UKHSA should focus on monitoring more carefully (e.g. lineage type or specific clusters? regions of interest?). Be sure to discuss sampling methods, as well as time period from which samples were used in the scanner run, as this is key to understanding how randomly sampled the sequences were from a given population.
Supplementary section below, primarily for report write up but feel free to continue onwards if time allows.
We can further look at lineage frequency, geographically matched growth rate estimates (using a GAM) and number of regions where a cluster is present (lower tier local authority, LTLA). We match our comparison samples to our clusters based on time and LTLA.
We again look at the top 3 growth clusters for B.1.1.7 and B.1.617.2 respectively, excluding samples where the most recent sample date is earlier than 01/05/2021. Table 2, above, shows a recap of these clusters, including cluster size, most recent sample date, least recent sample date, and logistic growth rate from scanner output.
The figures below show the estimated and empirical sample frequency of the cluster groups and geographic range of time. Column A shows the frequency of cluster (blue line) estimated with a GAM and a Gaussian process model for changes in time (don’t worry too much about the stats behind this!). The shaded region shows the 95% confidence interval for the observation if sampling is binomial. Points show the empirical lineage frequency on each day with size of points related to number of samples.
Frequency of the cluster is estimated relative to subset of non-cluster samples from the same set of LTLAs as where the cluster has been observed. Thus, the estimate does not represent VUI/VOC prevalence in England as a whole, but rather in regions where the cluster is present. Column B reflects A but with logit transformation. Column C shows the number of LTLAs where the cluster has been observed (sampled at least once) over time. Note variation in sample period (x axises) and estimates (y axises).
Questions to consider and discuss with your classmates if time permitting:
Which clusters show the highest frequency and geo-matched growth? How does this compare to the scanner relative growth rate estimates as reported in Table 2 or in the first figure?
What could make a cluster appear to have a high growth rate in the scanner tool, even if it is only present in <10 LTLAs? How do we identify this in these scanner outputs, and what other metadata (if any) could we use to more accurately identify this?